Overview

Dataset statistics

Number of variables: 23
Number of observations: 1296675
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 227.5 MiB
Average record size in memory: 184.0 B
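
The two memory figures above are consistent with a simple estimate. The sketch below assumes 8 bytes per cell (64-bit numbers, or object pointers in a shallow memory count); the report does not state how memory was measured, so treat that as an assumption.

```python
# Assumption: every cell costs 8 bytes (int64/float64 values or object
# pointers). The report itself does not say how memory was measured.
BYTES_PER_CELL = 8
N_VARIABLES = 23
N_OBSERVATIONS = 1_296_675
MIB = 2 ** 20

record_bytes = N_VARIABLES * BYTES_PER_CELL           # size of one row
total_mib = record_bytes * N_OBSERVATIONS / MIB       # whole table
column_mib = BYTES_PER_CELL * N_OBSERVATIONS / MIB    # one column

print(f"record size: {record_bytes} B")
print(f"total size:  {total_mib:.1f} MiB")
print(f"per column:  {column_mib:.1f} MiB")
```

Under that assumption the per-column figure also agrees with the "Memory size" value listed for each variable.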

Variable types

Numeric: 10
Categorical: 13

Warnings

trans_date_trans_time has a high cardinality: 1274791 distinct values (High cardinality)
merchant has a high cardinality: 693 distinct values (High cardinality)
first has a high cardinality: 352 distinct values (High cardinality)
last has a high cardinality: 481 distinct values (High cardinality)
street has a high cardinality: 983 distinct values (High cardinality)
city has a high cardinality: 894 distinct values (High cardinality)
state has a high cardinality: 51 distinct values (High cardinality)
job has a high cardinality: 494 distinct values (High cardinality)
dob has a high cardinality: 968 distinct values (High cardinality)
trans_num has a high cardinality: 1296675 distinct values (High cardinality)
Unnamed: 0 is highly correlated with unix_time (High correlation)
zip is highly correlated with long and 1 other field (High correlation)
lat is highly correlated with merch_lat (High correlation)
long is highly correlated with zip and 1 other field (High correlation)
unix_time is highly correlated with Unnamed: 0 (High correlation)
merch_lat is highly correlated with lat (High correlation)
merch_long is highly correlated with zip and 1 other field (High correlation)
unix_time is highly correlated with Unnamed: 0 (High correlation)
city_pop is highly correlated with state (High correlation)
lat is highly correlated with zip and 4 other fields (High correlation)
zip is highly correlated with lat and 4 other fields (High correlation)
merch_long is highly correlated with lat and 4 other fields (High correlation)
merch_lat is highly correlated with lat and 4 other fields (High correlation)
state is highly correlated with city_pop and 5 other fields (High correlation)
Unnamed: 0 is highly correlated with unix_time (High correlation)
long is highly correlated with lat and 4 other fields (High correlation)
amt is highly skewed (γ1 = 42.27787379) (Skewed)
Unnamed: 0 is uniformly distributed (Uniform)
trans_date_trans_time is uniformly distributed (Uniform)
trans_num is uniformly distributed (Uniform)
Unnamed: 0 has unique values (Unique)
trans_num has unique values (Unique)

Reproduction

Analysis started: 2021-09-15 16:57:33.249077
Analysis finished: 2021-09-15 17:01:32.965687
Duration: 3 minutes and 59.72 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
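
The reported duration can be checked directly from the two timestamps. A minimal sketch using only the standard library, with the timestamp format string inferred from the values shown above:

```python
from datetime import datetime

# Format inferred from the timestamps printed in the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2021-09-15 16:57:33.249077", FMT)
finished = datetime.strptime("2021-09-15 17:01:32.965687", FMT)

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```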

Variables

Unnamed: 0
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 1296675
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 648337
Minimum: 0
Maximum: 1296674
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 9.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 64833.7
Q1: 324168.5
median: 648337
Q3: 972505.5
95-th percentile: 1231840.3
Maximum: 1296674
Range: 1296674
Interquartile range (IQR): 648337

Descriptive statistics

Standard deviation: 374317.9745
Coefficient of variation (CV): 0.5773509371
Kurtosis: -1.2
Mean: 648337
Median Absolute Deviation (MAD): 324169
Skewness: -5.169118883 × 10^-15
Sum: 8.406823795 × 10^11
Variance: 1.40113946 × 10^11
Monotonicity: Strictly increasing
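
These statistics are what one would expect if Unnamed: 0 is simply a row index running 0, 1, ..., n-1. The sketch below checks this with closed-form expressions; it assumes the report uses the sample variance (divisor n-1), which the numbers appear to agree with.

```python
import math

n = 1_296_675  # number of observations from the overview

# Closed-form statistics of the sequence 0, 1, ..., n-1.
mean = (n - 1) / 2
total = n * (n - 1) // 2
variance = n * (n + 1) / 12      # sample variance (divisor n-1), an assumption
std = math.sqrt(variance)
cv = std / mean                  # coefficient of variation
q1 = 0.25 * (n - 1)              # linear-interpolation quantiles
q3 = 0.75 * (n - 1)

print(mean, total, round(std, 4), round(cv, 10), q1, q3)
```

A column like this carries no information beyond row order, which is also why it is flagged as unique, uniform and highly correlated with unix_time.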
Histogram with fixed size bins (bins=50)
Common values

Value                    Count    Frequency (%)
0                        1        < 0.1%
403054                   1        < 0.1%
458357                   1        < 0.1%
456308                   1        < 0.1%
446067                   1        < 0.1%
444018                   1        < 0.1%
450161                   1        < 0.1%
448112                   1        < 0.1%
405103                   1        < 0.1%
409197                   1        < 0.1%
Other values (1296665)   1296665  > 99.9%

Minimum values

Value    Count  Frequency (%)
0        1      < 0.1%
1        1      < 0.1%
2        1      < 0.1%
3        1      < 0.1%
4        1      < 0.1%
5        1      < 0.1%
6        1      < 0.1%
7        1      < 0.1%
8        1      < 0.1%
9        1      < 0.1%

Maximum values

Value    Count  Frequency (%)
1296674  1      < 0.1%
1296673  1      < 0.1%
1296672  1      < 0.1%
1296671  1      < 0.1%
1296670  1      < 0.1%
1296669  1      < 0.1%
1296668  1      < 0.1%
1296667  1      < 0.1%
1296666  1      < 0.1%
1296665  1      < 0.1%

trans_date_trans_time
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 1274791
Distinct (%): 98.3%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 MiB

Value                    Count
2020-06-01 01:37:47      4
2019-04-22 16:02:01      4
2020-06-02 12:47:07      4
2020-05-19 23:47:06      3
2019-10-06 18:26:55      3
Other values (1274786)   1296657

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 24636825
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
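
As an illustration of those character properties, the sketch below classifies the characters of the first sample value with Python's standard `unicodedata` module. The mapping from two-letter category codes to the names used in this report is written out by hand and covers only the four categories that occur here.

```python
import unicodedata
from collections import Counter

# Hand-written mapping from Unicode general-category codes to the
# category names used in this report (only the four that occur here).
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
}

sample = "2019-01-01 00:00:18"  # first sample row of trans_date_trans_time

counts = Counter(
    CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
    for ch in sample
)
for name, count in counts.most_common():
    print(f"{name}: {count}")
```

Every value in this column has the same 19-character layout, so multiplying these per-row counts by the 1296675 observations gives the category totals reported for the variable (for example 14 × 1296675 = 18153450 decimal digits).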

Unique

Unique: 1253218
Unique (%): 96.6%
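
"Distinct" and "Unique" measure different things: distinct counts the different values present, while unique counts only the values that occur exactly once, which is why 1253218 is smaller than 1274791. A minimal sketch on a hypothetical toy column:

```python
from collections import Counter

# Hypothetical toy column, not taken from the dataset.
values = ["a", "b", "b", "c", "c", "c", "d"]

counts = Counter(values)
distinct = len(counts)                                # different values present
unique = sum(1 for c in counts.values() if c == 1)    # values seen exactly once

print(f"distinct={distinct}, unique={unique}")
```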

Sample

1st row: 2019-01-01 00:00:18
2nd row: 2019-01-01 00:00:44
3rd row: 2019-01-01 00:00:51
4th row: 2019-01-01 00:01:16
5th row: 2019-01-01 00:03:06

Common Values

Value                    Count    Frequency (%)
2020-06-01 01:37:47      4        < 0.1%
2019-04-22 16:02:01      4        < 0.1%
2020-06-02 12:47:07      4        < 0.1%
2020-05-19 23:47:06      3        < 0.1%
2019-10-06 18:26:55      3        < 0.1%
2019-12-02 13:41:29      3        < 0.1%
2019-06-09 23:02:07      3        < 0.1%
2019-12-29 16:39:18      3        < 0.1%
2019-05-12 15:01:56      3        < 0.1%
2019-08-04 09:39:54      3        < 0.1%
Other values (1274781)   1296642  > 99.9%

Length

Histogram of lengths of the category
Value                  Count    Frequency (%)
2019-12-08             6428     0.2%
2019-12-15             6425     0.2%
2019-12-22             6325     0.2%
2019-12-29             6320     0.2%
2019-12-01             6283     0.2%
2019-12-09             6252     0.2%
2019-12-02             6150     0.2%
2019-12-16             6127     0.2%
2019-12-30             6064     0.2%
2019-12-23             5937     0.2%
Other values (86927)   2531039  97.6%

Most occurring characters

Value              Count    Frequency (%)
0                  4537200  18.4%
2                  3577846  14.5%
1                  3411517  13.8%
-                  2593350  10.5%
:                  2593350  10.5%
9                  1488122  6.0%
(space)            1296675  5.3%
3                  1201556  4.9%
5                  1073153  4.4%
4                  1060414  4.3%
Other values (3)   1803642  7.3%

Most occurring categories

Value               Count     Frequency (%)
Decimal Number      18153450  73.7%
Dash Punctuation    2593350   10.5%
Other Punctuation   2593350   10.5%
Space Separator     1296675   5.3%

Most frequent character per category

Decimal Number

Value    Count    Frequency (%)
0        4537200  25.0%
2        3577846  19.7%
1        3411517  18.8%
9        1488122  8.2%
3        1201556  6.6%
5        1073153  5.9%
4        1060414  5.8%
6        637897   3.5%
8        585293   3.2%
7        580452   3.2%

Dash Punctuation

Value    Count    Frequency (%)
-        2593350  100.0%

Space Separator

Value    Count    Frequency (%)
(space)  1296675  100.0%

Other Punctuation

Value    Count    Frequency (%)
:        2593350  100.0%

Most occurring scripts

Value    Count     Frequency (%)
Common   24636825  100.0%

Most frequent character per script

Common

Value              Count    Frequency (%)
0                  4537200  18.4%
2                  3577846  14.5%
1                  3411517  13.8%
-                  2593350  10.5%
:                  2593350  10.5%
9                  1488122  6.0%
(space)            1296675  5.3%
3                  1201556  4.9%
5                  1073153  4.4%
4                  1060414  4.3%
Other values (3)   1803642  7.3%

Most occurring blocks

Value    Count     Frequency (%)
ASCII    24636825  100.0%

Most frequent character per block

ASCII

Value              Count    Frequency (%)
0                  4537200  18.4%
2                  3577846  14.5%
1                  3411517  13.8%
-                  2593350  10.5%
:                  2593350  10.5%
9                  1488122  6.0%
(space)            1296675  5.3%
3                  1201556  4.9%
5                  1073153  4.4%
4                  1060414  4.3%
Other values (3)   1803642  7.3%

cc_num
Real number (ℝ≥0)

Distinct: 983
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.171920421 × 10^17
Minimum: 6.041620718 × 10^10
Maximum: 4.992346398 × 10^18
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.9 MiB

Quantile statistics

Minimum: 6.041620718 × 10^10
5-th percentile: 6.304848798 × 10^11
Q1: 1.800429465 × 10^14
median: 3.521417321 × 10^15
Q3: 4.642255475 × 10^15
95-th percentile: 4.497913966 × 10^18
Maximum: 4.992346398 × 10^18
Range: 4.992346338 × 10^18
Interquartile range (IQR): 4.462212529 × 10^15

Descriptive statistics

Standard deviation: 1.308806447 × 10^18
Coefficient of variation (CV): 3.1371798
Kurtosis: 6.179949935
Mean: 4.171920421 × 10^17
Median Absolute Deviation (MAD): 3.076470873 × 10^15
Skewness: 2.851879006
Sum: -6.725541877 × 10^18
Variance: 1.712974316 × 10^36
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                  Count    Frequency (%)
5.713652351 × 10^11    3123     0.2%
4.512828415 × 10^18    3123     0.2%
3.672269902 × 10^13    3119     0.2%
2.131124026 × 10^14    3117     0.2%
3.54510934 × 10^15     3113     0.2%
6.534628261 × 10^15    3112     0.2%
6.011367958 × 10^15    3110     0.2%
2.720433096 × 10^15    3107     0.2%
6.011438889 × 10^15    3106     0.2%
6.011109737 × 10^15    3101     0.2%
Other values (973)     1265544  97.6%

Minimum values

Value                  Count  Frequency (%)
6.041620718 × 10^10    1518   0.1%
6.042292873 × 10^10    1531   0.1%
6.042309813 × 10^10    510    < 0.1%
6.042785159 × 10^10    528    < 0.1%
6.048700208 × 10^10    496    < 0.1%
6.049059630 × 10^10    1010   0.1%
6.049559311 × 10^10    518    < 0.1%
5.018029536 × 10^11    1559   0.1%
5.018181333 × 10^11    8      < 0.1%
5.018282048 × 10^11    515    < 0.1%

Maximum values

Value                  Count  Frequency (%)
4.992346398 × 10^18    2059   0.2%
4.989847571 × 10^18    1007   0.1%
4.980323468 × 10^18    532    < 0.1%
4.973530368 × 10^18    1040   0.1%
4.958589672 × 10^18    1476   0.1%
4.95682899 × 10^18     2566   0.2%
4.911818931 × 10^18    9      < 0.1%
4.906628656 × 10^18    2584   0.2%
4.897067971 × 10^18    1038   0.1%
4.890424427 × 10^18    1496   0.1%

merchant
Categorical

HIGH CARDINALITY

Distinct: 693
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 MiB

Value                Count
fraud_Kilback LLC    4403
fraud_Cormier LLC    3649
fraud_Schumm PLC     3634
fraud_Kuhn LLC       3510
fraud_Boyer PLC      3493
Other values (688)   1277986

Length

Max length: 43
Median length: 20
Mean length: 23.13259683
Min length: 13

Characters and Unicode

Total characters: 29995460
Distinct characters: 55
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: fraud_Rippin, Kub and Mann
2nd row: fraud_Heller, Gutmann and Zieme
3rd row: fraud_Lind-Buckridge
4th row: fraud_Kutch, Hermiston and Farrell
5th row: fraud_Keeling-Crist

Common Values

Value                              Count    Frequency (%)
fraud_Kilback LLC                  4403     0.3%
fraud_Cormier LLC                  3649     0.3%
fraud_Schumm PLC                   3634     0.3%
fraud_Kuhn LLC                     3510     0.3%
fraud_Boyer PLC                    3493     0.3%
fraud_Dickinson Ltd                3434     0.3%
fraud_Cummerata-Jones              2736     0.2%
fraud_Kutch LLC                    2734     0.2%
fraud_Olson, Becker and Koch       2723     0.2%
fraud_Stroman, Hudson and Erdman   2721     0.2%
Other values (683)                 1263638  97.5%

Length

Histogram of lengths of the category
Value                Count    Frequency (%)
and                  474111   15.7%
llc                  97780    3.2%
inc                  91939    3.0%
sons                 73145    2.4%
ltd                  70853    2.3%
plc                  66475    2.2%
group                50447    1.7%
fraud_kutch          10560    0.3%
fraud_schaefer       9394     0.3%
fraud_streich        9250     0.3%
Other values (804)   2069403  68.4%

Most occurring characters

ValueCountFrequency (%)
a2910697
 
9.7%
r2695758
 
9.0%
d2139780
 
7.1%
e1865710
 
6.2%
u1857912
 
6.2%
n1768848
 
5.9%
1726682
 
5.8%
f1397378
 
4.7%
_1296675
 
4.3%
o1129340
 
3.8%
Other values (45)11206680
37.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter22698472
75.7%
Uppercase Letter3398527
 
11.3%
Space Separator1726682
 
5.8%
Connector Punctuation1296675
 
4.3%
Dash Punctuation445070
 
1.5%
Other Punctuation430034
 
1.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a2910697
12.8%
r2695758
11.9%
d2139780
9.4%
e1865710
 
8.2%
u1857912
 
8.2%
n1768848
 
7.8%
f1397378
 
6.2%
o1129340
 
5.0%
i1080395
 
4.8%
t873637
 
3.8%
Other values (15)4979017
21.9%
Uppercase Letter
ValueCountFrequency (%)
L477174
14.0%
C312176
 
9.2%
S301639
 
8.9%
B278515
 
8.2%
H260640
 
7.7%
K216627
 
6.4%
G192442
 
5.7%
R181447
 
5.3%
M179139
 
5.3%
P159738
 
4.7%
Other values (15)838990
24.7%
Other Punctuation
ValueCountFrequency (%)
,400966
93.2%
'29068
 
6.8%
Connector Punctuation
ValueCountFrequency (%)
_1296675
100.0%
Space Separator
ValueCountFrequency (%)
1726682
100.0%
Dash Punctuation
ValueCountFrequency (%)
-445070
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin26096999
87.0%
Common3898461
 
13.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a2910697
 
11.2%
r2695758
 
10.3%
d2139780
 
8.2%
e1865710
 
7.1%
u1857912
 
7.1%
n1768848
 
6.8%
f1397378
 
5.4%
o1129340
 
4.3%
i1080395
 
4.1%
t873637
 
3.3%
Other values (40)8377544
32.1%
Common
ValueCountFrequency (%)
1726682
44.3%
_1296675
33.3%
-445070
 
11.4%
,400966
 
10.3%
'29068
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII29995460
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a2910697
 
9.7%
r2695758
 
9.0%
d2139780
 
7.1%
e1865710
 
6.2%
u1857912
 
6.2%
n1768848
 
5.9%
1726682
 
5.8%
f1397378
 
4.7%
_1296675
 
4.3%
o1129340
 
3.8%
Other values (45)11206680
37.4%

category
Categorical

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 MiB

Value              Count
gas_transport      131659
grocery_pos        123638
home               123115
shopping_pos       116672
kids_pets          113035
Other values (9)   688556

Length

Max length: 14
Median length: 11
Mean length: 10.52607862
Min length: 4

Characters and Unicode

Total characters: 13648903
Distinct characters: 20
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: misc_net
2nd row: grocery_pos
3rd row: entertainment
4th row: gas_transport
5th row: misc_pos

Common Values

Value              Count   Frequency (%)
gas_transport      131659  10.2%
grocery_pos        123638  9.5%
home               123115  9.5%
shopping_pos       116672  9.0%
kids_pets          113035  8.7%
shopping_net       97543   7.5%
entertainment      94014   7.3%
food_dining        91461   7.1%
personal_care      90758   7.0%
health_fitness     85879   6.6%
Other values (4)   228901  17.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
gas_transport131659
10.2%
grocery_pos123638
9.5%
home123115
9.5%
shopping_pos116672
9.0%
kids_pets113035
8.7%
shopping_net97543
7.5%
entertainment94014
7.3%
food_dining91461
 
7.1%
personal_care90758
 
7.0%
health_fitness85879
 
6.6%
Other values (4)228901
17.7%

Most occurring characters

ValueCountFrequency (%)
s1429026
10.5%
e1287345
9.4%
o1231724
9.0%
n1193757
8.7%
p1083847
 
7.9%
t1076942
 
7.9%
_1039039
 
7.6%
r917535
 
6.7%
i833007
 
6.1%
a665234
 
4.9%
Other values (10)2891447
21.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter12609864
92.4%
Connector Punctuation1039039
 
7.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s1429026
11.3%
e1287345
10.2%
o1231724
9.8%
n1193757
9.5%
p1083847
8.6%
t1076942
8.5%
r917535
7.3%
i833007
 
6.6%
a665234
 
5.3%
g606425
 
4.8%
Other values (9)2285022
18.1%
Connector Punctuation
ValueCountFrequency (%)
_1039039
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin12609864
92.4%
Common1039039
 
7.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
s1429026
11.3%
e1287345
10.2%
o1231724
9.8%
n1193757
9.5%
p1083847
8.6%
t1076942
8.5%
r917535
7.3%
i833007
 
6.6%
a665234
 
5.3%
g606425
 
4.8%
Other values (9)2285022
18.1%
Common
ValueCountFrequency (%)
_1039039
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII13648903
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s1429026
10.5%
e1287345
9.4%
o1231724
9.0%
n1193757
8.7%
p1083847
 
7.9%
t1076942
 
7.9%
_1039039
 
7.6%
r917535
 
6.7%
i833007
 
6.1%
a665234
 
4.9%
Other values (10)2891447
21.2%

amt
Real number (ℝ≥0)

SKEWED

Distinct: 52928
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 70.35103546
Minimum: 1
Maximum: 28948.9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.9 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2.44
Q1: 9.65
median: 47.52
Q3: 83.14
95-th percentile: 196.31
Maximum: 28948.9
Range: 28947.9
Interquartile range (IQR): 73.49

Descriptive statistics

Standard deviation: 160.3160386
Coefficient of variation (CV): 2.278801407
Kurtosis: 4545.644979
Mean: 70.35103546
Median Absolute Deviation (MAD): 37.5
Skewness: 42.27787379
Sum: 91222428.9
Variance: 25701.23222
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                  Count    Frequency (%)
1.14                   542      < 0.1%
1.04                   538      < 0.1%
1.25                   535      < 0.1%
1.02                   533      < 0.1%
1.01                   523      < 0.1%
1.05                   519      < 0.1%
1.2                    516      < 0.1%
1.23                   515      < 0.1%
1.08                   512      < 0.1%
1.11                   509      < 0.1%
Other values (52918)   1291433  99.6%

Minimum values

Value    Count  Frequency (%)
1        222    < 0.1%
1.01     523    < 0.1%
1.02     533    < 0.1%
1.03     499    < 0.1%
1.04     538    < 0.1%
1.05     519    < 0.1%
1.06     471    < 0.1%
1.07     498    < 0.1%
1.08     512    < 0.1%
1.09     496    < 0.1%

Maximum values

Value     Count  Frequency (%)
28948.9   1      < 0.1%
27390.12  1      < 0.1%
27119.77  1      < 0.1%
26544.12  1      < 0.1%
25086.94  1      < 0.1%
17897.24  1      < 0.1%
15305.95  1      < 0.1%
15047.03  1      < 0.1%
15034.18  1      < 0.1%
14849.74  1      < 0.1%

first
Categorical

HIGH CARDINALITY

Distinct: 352
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 MiB

Value                Count
Christopher          26669
Robert               21667
Jessica              20581
James                20039
Michael              20009
Other values (347)   1187710

Length

Max length: 11
Median length: 6
Mean length: 6.080431874
Min length: 3

Characters and Unicode

Total characters: 7884344
Distinct characters: 49
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Jennifer
2nd row: Stephanie
3rd row: Edward
4th row: Jeremy
5th row: Tyler

Common Values

Value                Count    Frequency (%)
Christopher          26669    2.1%
Robert               21667    1.7%
Jessica              20581    1.6%
James                20039    1.5%
Michael              20009    1.5%
David                19965    1.5%
Jennifer             16940    1.3%
William              16371    1.3%
Mary                 16346    1.3%
John                 16325    1.3%
Other values (342)   1101763  85.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
christopher26669
 
2.1%
robert21667
 
1.7%
jessica20581
 
1.6%
james20039
 
1.5%
michael20009
 
1.5%
david19965
 
1.5%
jennifer16940
 
1.3%
william16371
 
1.3%
mary16346
 
1.3%
john16325
 
1.3%
Other values (342)1101763
85.0%

Most occurring characters

ValueCountFrequency (%)
a1007700
 
12.8%
e860878
 
10.9%
i618247
 
7.8%
n614453
 
7.8%
r607072
 
7.7%
l388220
 
4.9%
h344993
 
4.4%
s324237
 
4.1%
t311569
 
4.0%
o268849
 
3.4%
Other values (39)2538126
32.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6587669
83.6%
Uppercase Letter1296675
 
16.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a1007700
15.3%
e860878
13.1%
i618247
9.4%
n614453
9.3%
r607072
9.2%
l388220
 
5.9%
h344993
 
5.2%
s324237
 
4.9%
t311569
 
4.7%
o268849
 
4.1%
Other values (16)1241451
18.8%
Uppercase Letter
ValueCountFrequency (%)
J218907
16.9%
M144916
11.2%
S114469
8.8%
A112464
8.7%
C106121
8.2%
D86078
 
6.6%
K85426
 
6.6%
R70457
 
5.4%
T66590
 
5.1%
L62879
 
4.8%
Other values (13)228368
17.6%

Most occurring scripts

ValueCountFrequency (%)
Latin7884344
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a1007700
 
12.8%
e860878
 
10.9%
i618247
 
7.8%
n614453
 
7.8%
r607072
 
7.7%
l388220
 
4.9%
h344993
 
4.4%
s324237
 
4.1%
t311569
 
4.0%
o268849
 
3.4%
Other values (39)2538126
32.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII7884344
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a1007700
 
12.8%
e860878
 
10.9%
i618247
 
7.8%
n614453
 
7.8%
r607072
 
7.7%
l388220
 
4.9%
h344993
 
4.4%
s324237
 
4.1%
t311569
 
4.0%
o268849
 
3.4%
Other values (39)2538126
32.2%

last
Categorical

HIGH CARDINALITY

Distinct: 481
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 MiB

Value                Count
Smith                28794
Williams             23605
Davis                21910
Johnson              20034
Rodriguez            17394
Other values (476)   1184938

Length

Max length: 11
Median length: 6
Mean length: 6.111177435
Min length: 2

Characters and Unicode

Total characters: 7924211
Distinct characters: 48
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Banks
2nd row: Gill
3rd row: Sanchez
4th row: White
5th row: Garcia

Common Values

Value                Count    Frequency (%)
Smith                28794    2.2%
Williams             23605    1.8%
Davis                21910    1.7%
Johnson              20034    1.5%
Rodriguez            17394    1.3%
Martinez             14805    1.1%
Jones                13976    1.1%
Lewis                12753    1.0%
Gonzalez             11799    0.9%
Miller               11698    0.9%
Other values (471)   1119907  86.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
smith28794
 
2.2%
williams23605
 
1.8%
davis21910
 
1.7%
johnson20034
 
1.5%
rodriguez17394
 
1.3%
martinez14805
 
1.1%
jones13976
 
1.1%
lewis12753
 
1.0%
gonzalez11799
 
0.9%
miller11698
 
0.9%
Other values (471)1119907
86.4%

Most occurring characters

ValueCountFrequency (%)
e786302
 
9.9%
r658748
 
8.3%
a648005
 
8.2%
n609178
 
7.7%
o583517
 
7.4%
l489180
 
6.2%
s487668
 
6.2%
i435378
 
5.5%
t288591
 
3.6%
h228981
 
2.9%
Other values (38)2708663
34.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6627536
83.6%
Uppercase Letter1296675
 
16.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e786302
11.9%
r658748
9.9%
a648005
9.8%
n609178
9.2%
o583517
8.8%
l489180
 
7.4%
s487668
 
7.4%
i435378
 
6.6%
t288591
 
4.4%
h228981
 
3.5%
Other values (15)1411988
21.3%
Uppercase Letter
ValueCountFrequency (%)
M158701
12.2%
W106490
 
8.2%
S105221
 
8.1%
C93308
 
7.2%
B84092
 
6.5%
R83194
 
6.4%
H81444
 
6.3%
G75241
 
5.8%
J71781
 
5.5%
P66087
 
5.1%
Other values (13)371116
28.6%

Most occurring scripts

ValueCountFrequency (%)
Latin7924211
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e786302
 
9.9%
r658748
 
8.3%
a648005
 
8.2%
n609178
 
7.7%
o583517
 
7.4%
l489180
 
6.2%
s487668
 
6.2%
i435378
 
5.5%
t288591
 
3.6%
h228981
 
2.9%
Other values (38)2708663
34.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII7924211
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e786302
 
9.9%
r658748
 
8.3%
a648005
 
8.2%
n609178
 
7.7%
o583517
 
7.4%
l489180
 
6.2%
s487668
 
6.2%
i435378
 
5.5%
t288591
 
3.6%
h228981
 
2.9%
Other values (38)2708663
34.2%

gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 MiB

Value   Count
F       709863
M       586812

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1296675
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: F
2nd row: F
3rd row: M
4th row: M
5th row: M

Common Values

Value   Count   Frequency (%)
F       709863  54.7%
M       586812  45.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
f709863
54.7%
m586812
45.3%

Most occurring characters

ValueCountFrequency (%)
F709863
54.7%
M586812
45.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1296675
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
F709863
54.7%
M586812
45.3%

Most occurring scripts

ValueCountFrequency (%)
Latin1296675
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
F709863
54.7%
M586812
45.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII1296675
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
F709863
54.7%
M586812
45.3%

street
Categorical

HIGH CARDINALITY

Distinct: 983
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 MiB

Value                               Count
864 Reynolds Plains                 3123
0069 Robin Brooks Apt. 695          3123
8172 Robertson Parkways Suite 072   3119
4664 Sanchez Common Suite 930       3117
8030 Beck Motorway                  3113
Other values (978)                  1281080

Length

Max length: 35
Median length: 22
Mean length: 22.22902655
Min length: 12

Characters and Unicode

Total characters: 28823823
Distinct characters: 62
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 561 Perry Cove
2nd row: 43039 Riley Greens Suite 393
3rd row: 594 White Dale Suite 530
4th row: 9443 Cynthia Court Apt. 038
5th row: 408 Bradley Rest

Common Values

Value                               Count    Frequency (%)
864 Reynolds Plains                 3123     0.2%
0069 Robin Brooks Apt. 695          3123     0.2%
8172 Robertson Parkways Suite 072   3119     0.2%
4664 Sanchez Common Suite 930       3117     0.2%
8030 Beck Motorway                  3113     0.2%
29606 Martinez Views Suite 653      3112     0.2%
1652 James Mews                     3110     0.2%
854 Walker Dale Suite 488           3107     0.2%
40624 Rebecca Spurs                 3106     0.2%
594 Berry Lights Apt. 392           3101     0.2%
Other values (973)                  1265544  97.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
apt327791
 
6.4%
suite305467
 
5.9%
island22954
 
0.4%
michael18967
 
0.4%
common17978
 
0.3%
station17957
 
0.3%
islands17917
 
0.3%
david17476
 
0.3%
brooks16991
 
0.3%
fields16321
 
0.3%
Other values (1940)4376722
84.9%

Most occurring characters

ValueCountFrequency (%)
3859866
 
13.4%
e1792676
 
6.2%
a1454190
 
5.0%
i1296969
 
4.5%
t1248091
 
4.3%
r1103208
 
3.8%
n1066149
 
3.7%
s1034564
 
3.6%
l889594
 
3.1%
o875571
 
3.0%
Other values (52)14202945
49.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter14413030
50.0%
Decimal Number6996528
24.3%
Space Separator3859866
 
13.4%
Uppercase Letter3226608
 
11.2%
Other Punctuation327791
 
1.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1792676
12.4%
a1454190
10.1%
i1296969
9.0%
t1248091
8.7%
r1103208
 
7.7%
n1066149
 
7.4%
s1034564
 
7.2%
l889594
 
6.2%
o875571
 
6.1%
u613916
 
4.3%
Other values (16)3038102
21.1%
Uppercase Letter
ValueCountFrequency (%)
S561924
17.4%
A421707
13.1%
M258180
 
8.0%
C223839
 
6.9%
P195864
 
6.1%
R186303
 
5.8%
B148676
 
4.6%
F143149
 
4.4%
L131665
 
4.1%
J121164
 
3.8%
Other values (14)834137
25.9%
Decimal Number
ValueCountFrequency (%)
5748812
10.7%
3739928
10.6%
2734719
10.5%
7703124
10.0%
1693880
9.9%
8692585
9.9%
6677709
9.7%
0677245
9.7%
4669799
9.6%
9658727
9.4%
Space Separator
ValueCountFrequency (%)
3859866
100.0%
Other Punctuation
ValueCountFrequency (%)
.327791
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin17639638
61.2%
Common11184185
38.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1792676
 
10.2%
a1454190
 
8.2%
i1296969
 
7.4%
t1248091
 
7.1%
r1103208
 
6.3%
n1066149
 
6.0%
s1034564
 
5.9%
l889594
 
5.0%
o875571
 
5.0%
u613916
 
3.5%
Other values (40)6264710
35.5%
Common
ValueCountFrequency (%)
3859866
34.5%
5748812
 
6.7%
3739928
 
6.6%
2734719
 
6.6%
7703124
 
6.3%
1693880
 
6.2%
8692585
 
6.2%
6677709
 
6.1%
0677245
 
6.1%
4669799
 
6.0%
Other values (2)986518
 
8.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII28823823
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3859866
 
13.4%
e1792676
 
6.2%
a1454190
 
5.0%
i1296969
 
4.5%
t1248091
 
4.3%
r1103208
 
3.8%
n1066149
 
3.7%
s1034564
 
3.6%
l889594
 
3.1%
o875571
 
3.0%
Other values (52)14202945
49.3%

city
Categorical

HIGH CARDINALITY

Distinct: 894
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 MiB

Value                Count
Birmingham           5617
San Antonio          5130
Utica                5105
Phoenix              5075
Meridian             5060
Other values (889)   1270688

Length

Max length: 25
Median length: 8
Mean length: 8.652245937
Min length: 3

Characters and Unicode

Total characters: 11219151
Distinct characters: 52
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Moravian Falls
2nd row: Orient
3rd row: Malad City
4th row: Boulder
5th row: Doe Hill

Common Values

Value                Count    Frequency (%)
Birmingham           5617     0.4%
San Antonio          5130     0.4%
Utica                5105     0.4%
Phoenix              5075     0.4%
Meridian             5060     0.4%
Thomas               4634     0.4%
Conway               4613     0.4%
Cleveland            4604     0.4%
Warren               4599     0.4%
Houston              4168     0.3%
Other values (884)   1248070  96.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
city21314
 
1.3%
west19473
 
1.2%
north14425
 
0.9%
saint14363
 
0.9%
falls12794
 
0.8%
new11842
 
0.7%
mount11375
 
0.7%
lake11249
 
0.7%
san10260
 
0.6%
springs8727
 
0.5%
Other values (918)1482445
91.6%

Most occurring characters

ValueCountFrequency (%)
e1090254
 
9.7%
a935089
 
8.3%
n821831
 
7.3%
o817806
 
7.3%
l781662
 
7.0%
r748921
 
6.7%
i704285
 
6.3%
t598490
 
5.3%
s446306
 
4.0%
321592
 
2.9%
Other values (42)3952915
35.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter9277246
82.7%
Uppercase Letter1619290
 
14.4%
Space Separator321592
 
2.9%
Dash Punctuation1023
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C156587
 
9.7%
M147711
 
9.1%
S136036
 
8.4%
B133396
 
8.2%
H115641
 
7.1%
W95433
 
5.9%
P92084
 
5.7%
L86511
 
5.3%
R79150
 
4.9%
A74999
 
4.6%
Other values (15)501742
31.0%
Lowercase Letter
ValueCountFrequency (%)
e1090254
11.8%
a935089
10.1%
n821831
8.9%
o817806
8.8%
l781662
 
8.4%
r748921
 
8.1%
i704285
 
7.6%
t598490
 
6.5%
s446306
 
4.8%
d309005
 
3.3%
Other values (15)2023597
21.8%
Space Separator
ValueCountFrequency (%)
321592
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1023
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin10896536
97.1%
Common322615
 
2.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1090254
 
10.0%
a935089
 
8.6%
n821831
 
7.5%
o817806
 
7.5%
l781662
 
7.2%
r748921
 
6.9%
i704285
 
6.5%
t598490
 
5.5%
s446306
 
4.1%
d309005
 
2.8%
Other values (40)3642887
33.4%
Common
ValueCountFrequency (%)
321592
99.7%
-1023
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII11219151
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e1090254
 
9.7%
a935089
 
8.3%
n821831
 
7.3%
o817806
 
7.3%
l781662
 
7.0%
r748921
 
6.7%
i704285
 
6.3%
t598490
 
5.3%
s446306
 
4.0%
321592
 
2.9%
Other values (42)3952915
35.2%

state
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 51
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 MiB

Value               Count
TX                  94876
NY                  83501
PA                  79847
CA                  56360
OH                  46480
Other values (46)   935611

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 2593350
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NC
2nd row: WA
3rd row: ID
4th row: MT
5th row: VA

Common Values

ValueCountFrequency (%)
TX94876
 
7.3%
NY83501
 
6.4%
PA79847
 
6.2%
CA56360
 
4.3%
OH46480
 
3.6%
MI46154
 
3.6%
IL43252
 
3.3%
FL42671
 
3.3%
AL40989
 
3.2%
MO38403
 
3.0%
Other values (41)724142
55.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
tx94876
 
7.3%
ny83501
 
6.4%
pa79847
 
6.2%
ca56360
 
4.3%
oh46480
 
3.6%
mi46154
 
3.6%
il43252
 
3.3%
fl42671
 
3.3%
al40989
 
3.2%
mo38403
 
3.0%
Other values (41)724142
55.8%

Most occurring characters

ValueCountFrequency (%)
A355776
13.7%
N284464
 
11.0%
M220694
 
8.5%
I181993
 
7.0%
T154353
 
6.0%
L147877
 
5.7%
O144031
 
5.6%
C141011
 
5.4%
Y131298
 
5.1%
X94876
 
3.7%
Other values (14)736977
28.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter2593350
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A355776
13.7%
N284464
 
11.0%
M220694
 
8.5%
I181993
 
7.0%
T154353
 
6.0%
L147877
 
5.7%
O144031
 
5.6%
C141011
 
5.4%
Y131298
 
5.1%
X94876
 
3.7%
Other values (14)736977
28.4%

Most occurring scripts

ValueCountFrequency (%)
Latin2593350
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A355776
13.7%
N284464
 
11.0%
M220694
 
8.5%
I181993
 
7.0%
T154353
 
6.0%
L147877
 
5.7%
O144031
 
5.6%
C141011
 
5.4%
Y131298
 
5.1%
X94876
 
3.7%
Other values (14)736977
28.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII2593350
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A355776
13.7%
N284464
 
11.0%
M220694
 
8.5%
I181993
 
7.0%
T154353
 
6.0%
L147877
 
5.7%
O144031
 
5.6%
C141011
 
5.4%
Y131298
 
5.1%
X94876
 
3.7%
Other values (14)736977
28.4%

zip
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct970
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean48800.6711
Minimum1257
Maximum99783
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size9.9 MiB

Quantile statistics

Minimum1257
5-th percentile7208
Q126237
median48174
Q372042
95-th percentile94569
Maximum99783
Range98526
Interquartile range (IQR)45805

Descriptive statistics

Standard deviation26893.22248
Coefficient of variation (CV)0.551083046
Kurtosis-1.096449332
Mean48800.6711
Median Absolute Deviation (MAD)23068
Skewness0.07968075775
Sum6.32786102 × 10^10
Variance723245415.2
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value   Count   Frequency (%)
73754   3646    0.3%
34112   3613    0.3%
48088   3597    0.3%
82514   3527    0.3%
15484   3123    0.2%
49628   3123    0.2%
85173   3119    0.2%
29819   3117    0.2%
38761   3113    0.2%
5461    3112    0.2%
Other values (960)   1263585   97.4%

Value   Count   Frequency (%)
1257    2023    0.2%
1330    1031    0.1%
1535    515     < 0.1%
1545    1024    0.1%
1612    519     < 0.1%
1843    2597    0.2%
1844    2058    0.2%
2180    519     < 0.1%
2630    2090    0.2%
2908    550     < 0.1%

Value   Count   Frequency (%)
99783   1568    0.1%
99747   12      < 0.1%
99746   540     < 0.1%
99323   2572    0.2%
99160   3030    0.2%
99116   15      < 0.1%
99113   1047    0.1%
99033   2458    0.2%
98836   524     < 0.1%
98665   500     < 0.1%

lat
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct968
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean38.53762161
Minimum20.0271
Maximum66.6933
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size9.9 MiB

Quantile statistics

Minimum20.0271
5-th percentile29.8826
Q134.6205
median39.3543
Q341.9404
95-th percentile45.8433
Maximum66.6933
Range46.6662
Interquartile range (IQR)7.3199

Descriptive statistics

Standard deviation5.075808439
Coefficient of variation (CV)0.1317104748
Kurtosis0.8129679455
Mean38.53762161
Median Absolute Deviation (MAD)3.3597
Skewness-0.1860276801
Sum49970770.51
Variance25.76383131
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value     Count   Frequency (%)
36.385    3646    0.3%
26.1184   3613    0.3%
42.5164   3597    0.3%
43.0048   3527    0.3%
39.8936   3123    0.2%
44.5995   3123    0.2%
33.2887   3119    0.2%
34.0326   3117    0.2%
33.4783   3113    0.2%
44.3346   3112    0.2%
Other values (958)   1263585   97.4%

Value     Count   Frequency (%)
20.0271   1527    0.1%
20.0827   1032    0.1%
24.6557   2584    0.2%
26.1184   3613    0.3%
26.3304   542     < 0.1%
26.3771   518     < 0.1%
26.4215   3038    0.2%
26.4722   2524    0.2%
26.529    1549    0.1%
26.6939   1027    0.1%

Value     Count   Frequency (%)
66.6933   12      < 0.1%
65.6899   540     < 0.1%
64.7556   1568    0.1%
48.8878   3030    0.2%
48.8856   2066    0.2%
48.8328   1533    0.1%
48.6669   1047    0.1%
48.6031   2973    0.2%
48.4786   2038    0.2%
48.34     3088    0.2%

long
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct969
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-90.22633538
Minimum-165.6723
Maximum-67.9503
Zeros0
Zeros (%)0.0%
Negative1296675
Negative (%)100.0%
Memory size9.9 MiB

Quantile statistics

Minimum-165.6723
5-th percentile-119.0825
Q1-96.798
median-87.4769
Q3-80.158
95-th percentile-73.5112
Maximum-67.9503
Range97.722
Interquartile range (IQR)16.64

Descriptive statistics

Standard deviation13.75907695
Coefficient of variation (CV)-0.1524951323
Kurtosis1.855892285
Mean-90.22633538
Median Absolute Deviation (MAD)8.1527
Skewness-1.150107737
Sum-116994233.4
Variance189.3121984
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-98.07273646
 
0.3%
-81.73613613
 
0.3%
-82.98323597
 
0.3%
-108.89643527
 
0.3%
-86.21413123
 
0.2%
-79.78563123
 
0.2%
-111.09853119
 
0.2%
-82.20273117
 
0.2%
-90.51423113
 
0.2%
-73.0983112
 
0.2%
Other values (959)1263585
97.4%
ValueCountFrequency (%)
-165.67231568
0.1%
-156.292540
 
< 0.1%
-155.4881032
0.1%
-155.36971527
0.1%
-153.99412
 
< 0.1%
-124.44091043
0.1%
-124.21741547
0.1%
-124.15871031
0.1%
-124.14371526
0.1%
-123.97432036
0.2%
ValueCountFrequency (%)
-67.95032080
0.2%
-68.55651014
 
0.1%
-69.2675519
 
< 0.1%
-69.48282050
0.2%
-69.9576537
 
< 0.1%
-69.96563107
0.2%
-70.10319
 
< 0.1%
-70.2391036
 
0.1%
-70.30012090
0.2%
-70.34571527
0.1%

city_pop
Real number (ℝ≥0)

HIGH CORRELATION

Distinct879
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean88824.44056
Minimum23
Maximum2906700
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size9.9 MiB

Quantile statistics

Minimum23
5-th percentile139
Q1743
median2456
Q320328
95-th percentile525713
Maximum2906700
Range2906677
Interquartile range (IQR)19585

Descriptive statistics

Standard deviation301956.3607
Coefficient of variation (CV)3.399473825
Kurtosis37.6145193
Mean88824.44056
Median Absolute Deviation (MAD)2198
Skewness5.593853067
Sum1.151764315 × 10^11
Variance9.117764376 × 10^10
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
6065496
 
0.4%
15957975130
 
0.4%
13129225075
 
0.4%
17664574
 
0.4%
2414533
 
0.3%
29067004168
 
0.3%
2760024155
 
0.3%
3024147
 
0.3%
9101484073
 
0.3%
1984067
 
0.3%
Other values (869)1251257
96.5%
ValueCountFrequency (%)
232049
0.2%
371013
 
0.1%
432034
0.2%
463040
0.2%
47511
 
< 0.1%
491054
 
0.1%
511016
 
0.1%
52518
 
< 0.1%
532610
0.2%
601045
 
0.1%
ValueCountFrequency (%)
29067004168
0.3%
25047002033
 
0.2%
2383912521
 
< 0.1%
15957975130
0.4%
15773852563
0.2%
15262063517
0.3%
14177938
 
< 0.1%
13824802056
0.2%
13129225075
0.4%
12633213629
0.3%

job
Categorical

HIGH CARDINALITY

Distinct494
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size9.9 MiB
Film/video editor
 
9779
Exhibition designer
 
9199
Naval architect
 
8684
Surveyor, land/geomatics
 
8680
Materials engineer
 
8270
Other values (489)
1252063 

Length

Max length59
Median length19
Mean length20.2271024
Min length3

Characters and Unicode

Total characters26227978
Distinct characters53
Distinct categories6
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowPsychologist, counselling
2nd rowSpecial educational needs teacher
3rd rowNature conservation officer
4th rowPatent attorney
5th rowDance movement psychotherapist

Common Values

ValueCountFrequency (%)
Film/video editor9779
 
0.8%
Exhibition designer9199
 
0.7%
Naval architect8684
 
0.7%
Surveyor, land/geomatics8680
 
0.7%
Materials engineer8270
 
0.6%
Designer, ceramics/pottery8225
 
0.6%
Systems developer7700
 
0.6%
IT trainer7679
 
0.6%
Financial adviser7659
 
0.6%
Environmental consultant7547
 
0.6%
Other values (484)1213253
93.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
engineer131756
 
4.6%
officer110915
 
3.9%
manager61124
 
2.1%
scientist55878
 
1.9%
designer52218
 
1.8%
surveyor49062
 
1.7%
teacher38126
 
1.3%
psychologist32600
 
1.1%
research29754
 
1.0%
editor28725
 
1.0%
Other values (456)2289024
79.5%

Most occurring characters

ValueCountFrequency (%)
e2803032
 
10.7%
i2386346
 
9.1%
r2198669
 
8.4%
a1813638
 
6.9%
t1782302
 
6.8%
n1764769
 
6.7%
1582507
 
6.0%
o1491775
 
5.7%
s1444701
 
5.5%
c1323152
 
5.0%
Other values (43)7637087
29.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter22784440
86.9%
Space Separator1582507
 
6.0%
Uppercase Letter1369269
 
5.2%
Other Punctuation443484
 
1.7%
Open Punctuation24139
 
0.1%
Close Punctuation24139
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e2803032
12.3%
i2386346
10.5%
r2198669
9.6%
a1813638
 
8.0%
t1782302
 
7.8%
n1764769
 
7.7%
o1491775
 
6.5%
s1444701
 
6.3%
c1323152
 
5.8%
l999624
 
4.4%
Other values (16)4776432
21.0%
Uppercase Letter
ValueCountFrequency (%)
C156704
11.4%
E145426
10.6%
P143111
10.5%
S137500
10.0%
T113148
 
8.3%
M89545
 
6.5%
A88466
 
6.5%
F68651
 
5.0%
D58034
 
4.2%
R55841
 
4.1%
Other values (11)312843
22.8%
Other Punctuation
ValueCountFrequency (%)
,312210
70.4%
/123567
 
27.9%
'7707
 
1.7%
Space Separator
ValueCountFrequency (%)
1582507
100.0%
Open Punctuation
ValueCountFrequency (%)
(24139
100.0%
Close Punctuation
ValueCountFrequency (%)
)24139
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin24153709
92.1%
Common2074269
 
7.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e2803032
11.6%
i2386346
 
9.9%
r2198669
 
9.1%
a1813638
 
7.5%
t1782302
 
7.4%
n1764769
 
7.3%
o1491775
 
6.2%
s1444701
 
6.0%
c1323152
 
5.5%
l999624
 
4.1%
Other values (37)6145701
25.4%
Common
ValueCountFrequency (%)
1582507
76.3%
,312210
 
15.1%
/123567
 
6.0%
(24139
 
1.2%
)24139
 
1.2%
'7707
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII26227978
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e2803032
 
10.7%
i2386346
 
9.1%
r2198669
 
8.4%
a1813638
 
6.9%
t1782302
 
6.8%
n1764769
 
6.7%
1582507
 
6.0%
o1491775
 
5.7%
s1444701
 
5.5%
c1323152
 
5.0%
Other values (43)7637087
29.1%

dob
Categorical

HIGH CARDINALITY

Distinct968
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size9.9 MiB
1977-03-23
 
5636
1981-08-29
 
4636
1988-09-15
 
4623
1955-05-06
 
3661
1983-07-25
 
3123
Other values (963)
1274996 

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters12966750
Distinct characters11
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row1988-03-09
2nd row1978-06-21
3rd row1962-01-19
4th row1967-01-12
5th row1986-03-28

Common Values

ValueCountFrequency (%)
1977-03-235636
 
0.4%
1981-08-294636
 
0.4%
1988-09-154623
 
0.4%
1955-05-063661
 
0.3%
1983-07-253123
 
0.2%
1995-07-123123
 
0.2%
1987-10-283119
 
0.2%
1984-06-033117
 
0.2%
1999-03-053113
 
0.2%
1998-03-193112
 
0.2%
Other values (958)1259412
97.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1977-03-235636
 
0.4%
1981-08-294636
 
0.4%
1988-09-154623
 
0.4%
1955-05-063661
 
0.3%
1983-07-253123
 
0.2%
1995-07-123123
 
0.2%
1987-10-283119
 
0.2%
1984-06-033117
 
0.2%
1999-03-053113
 
0.2%
1998-03-193112
 
0.2%
Other values (958)1259412
97.1%

Most occurring characters

ValueCountFrequency (%)
-2593350
20.0%
12482923
19.1%
91846679
14.2%
01791076
13.8%
2903212
 
7.0%
7664815
 
5.1%
8645604
 
5.0%
6548041
 
4.2%
5536159
 
4.1%
3484324
 
3.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number10373400
80.0%
Dash Punctuation2593350
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
12482923
23.9%
91846679
17.8%
01791076
17.3%
2903212
 
8.7%
7664815
 
6.4%
8645604
 
6.2%
6548041
 
5.3%
5536159
 
5.2%
3484324
 
4.7%
4470567
 
4.5%
Dash Punctuation
ValueCountFrequency (%)
-2593350
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common12966750
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
-2593350
20.0%
12482923
19.1%
91846679
14.2%
01791076
13.8%
2903212
 
7.0%
7664815
 
5.1%
8645604
 
5.0%
6548041
 
4.2%
5536159
 
4.1%
3484324
 
3.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII12966750
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
-2593350
20.0%
12482923
19.1%
91846679
14.2%
01791076
13.8%
2903212
 
7.0%
7664815
 
5.1%
8645604
 
5.0%
6548041
 
4.2%
5536159
 
4.1%
3484324
 
3.7%

trans_num
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct1296675
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Memory size9.9 MiB
1d8659b241ea2a332a6b5fec6b954efd
 
1
d45f27d0e3ddbd615980f3e7b00cfede
 
1
ce96501e0654431b16afaa57e76dba88
 
1
e23492fd89c63e769103d46315d9981c
 
1
8d001c66a1fb40b9e1466fb2ee79efa4
 
1
Other values (1296670)
1296670 

Length

Max length32
Median length32
Mean length32
Min length32

Characters and Unicode

Total characters41493600
Distinct characters16
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1296675
Unique (%)100.0%

Sample

1st row0b242abb623afc578575680df30655b9
2nd row1f76529f8574734946361c461b024d99
3rd rowa1a22d70485983eac12b5b88dad1cf95
4th row6b849c168bdad6f867558c3793159a81
5th rowa41d7549acf90789359a9aa5346dcb46

Common Values

ValueCountFrequency (%)
1d8659b241ea2a332a6b5fec6b954efd1
 
< 0.1%
d45f27d0e3ddbd615980f3e7b00cfede1
 
< 0.1%
ce96501e0654431b16afaa57e76dba881
 
< 0.1%
e23492fd89c63e769103d46315d9981c1
 
< 0.1%
8d001c66a1fb40b9e1466fb2ee79efa41
 
< 0.1%
924eb05bd62271fd4a9b7fcf8cc065af1
 
< 0.1%
3b82a45e3748cf8824683e9f8120a8ee1
 
< 0.1%
315931cad67e4839177e1e5a41c1483d1
 
< 0.1%
5d69f9c0776424163cef81df580824521
 
< 0.1%
a58660faa5b2fc3559a1545eb688670d1
 
< 0.1%
Other values (1296665)1296665
> 99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1d8659b241ea2a332a6b5fec6b954efd1
 
< 0.1%
d45f27d0e3ddbd615980f3e7b00cfede1
 
< 0.1%
ce96501e0654431b16afaa57e76dba881
 
< 0.1%
e23492fd89c63e769103d46315d9981c1
 
< 0.1%
8d001c66a1fb40b9e1466fb2ee79efa41
 
< 0.1%
924eb05bd62271fd4a9b7fcf8cc065af1
 
< 0.1%
3b82a45e3748cf8824683e9f8120a8ee1
 
< 0.1%
315931cad67e4839177e1e5a41c1483d1
 
< 0.1%
5d69f9c0776424163cef81df580824521
 
< 0.1%
a58660faa5b2fc3559a1545eb688670d1
 
< 0.1%
Other values (1296665)1296665
> 99.9%

Most occurring characters

ValueCountFrequency (%)
22596593
 
6.3%
92596375
 
6.3%
72595084
 
6.3%
42594676
 
6.3%
a2594103
 
6.3%
d2593816
 
6.3%
32593713
 
6.3%
f2593666
 
6.3%
52593098
 
6.2%
e2592759
 
6.2%
Other values (6)15549717
37.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number25937791
62.5%
Lowercase Letter15555809
37.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
22596593
10.0%
92596375
10.0%
72595084
10.0%
42594676
10.0%
32593713
10.0%
52593098
10.0%
12592577
10.0%
82592342
10.0%
02591678
10.0%
62591655
10.0%
Lowercase Letter
ValueCountFrequency (%)
a2594103
16.7%
d2593816
16.7%
f2593666
16.7%
e2592759
16.7%
c2592326
16.7%
b2589139
16.6%

Most occurring scripts

ValueCountFrequency (%)
Common25937791
62.5%
Latin15555809
37.5%

Most frequent character per script

Common
ValueCountFrequency (%)
22596593
10.0%
92596375
10.0%
72595084
10.0%
42594676
10.0%
32593713
10.0%
52593098
10.0%
12592577
10.0%
82592342
10.0%
02591678
10.0%
62591655
10.0%
Latin
ValueCountFrequency (%)
a2594103
16.7%
d2593816
16.7%
f2593666
16.7%
e2592759
16.7%
c2592326
16.7%
b2589139
16.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII41493600
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
22596593
 
6.3%
92596375
 
6.3%
72595084
 
6.3%
42594676
 
6.3%
a2594103
 
6.3%
d2593816
 
6.3%
32593713
 
6.3%
f2593666
 
6.3%
52593098
 
6.2%
e2592759
 
6.2%
Other values (6)15549717
37.5%

unix_time
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct1274823
Distinct (%)98.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1349243637
Minimum1325376018
Maximum1371816817
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size9.9 MiB

Quantile statistics

Minimum1325376018
5-th percentile1328671975
Q11338750742
median1349249747
Q31359385376
95-th percentile1369830595
Maximum1371816817
Range46440799
Interquartile range (IQR)20634633

Descriptive statistics

Standard deviation12841278.42
Coefficient of variation (CV)0.009517390391
Kurtosis-1.087540501
Mean1349243637
Median Absolute Deviation (MAD)10358807
Skewness0.003377949757
Sum1.749530493 × 10^15
Variance1.648984315 × 10^14
MonotonicityIncreasing
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
13351105214
 
< 0.1%
13700506674
 
< 0.1%
13701772274
 
< 0.1%
13568903883
 
< 0.1%
13716586033
 
< 0.1%
13296758133
 
< 0.1%
13465385763
 
< 0.1%
13555921673
 
< 0.1%
13393584923
 
< 0.1%
13565424943
 
< 0.1%
Other values (1274813)1296642
> 99.9%
ValueCountFrequency (%)
13253760181
< 0.1%
13253760441
< 0.1%
13253760511
< 0.1%
13253760761
< 0.1%
13253761861
< 0.1%
13253762481
< 0.1%
13253762821
< 0.1%
13253763081
< 0.1%
13253763181
< 0.1%
13253763611
< 0.1%
ValueCountFrequency (%)
13718168171
< 0.1%
13718168161
< 0.1%
13718167521
< 0.1%
13718167391
< 0.1%
13718167281
< 0.1%
13718166961
< 0.1%
13718166831
< 0.1%
13718166561
< 0.1%
13718165621
< 0.1%
13718165221
< 0.1%

merch_lat
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct1247805
Distinct (%)96.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean38.53733804
Minimum19.027785
Maximum67.510267
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size9.9 MiB

Quantile statistics

Minimum19.027785
5-th percentile29.7516534
Q134.733572
median39.36568
Q341.957164
95-th percentile46.0035301
Maximum67.510267
Range48.482482
Interquartile range (IQR)7.223592

Descriptive statistics

Standard deviation5.10978837
Coefficient of variation (CV)0.1325931844
Kurtosis0.79599391
Mean38.53733804
Median Absolute Deviation (MAD)3.397536
Skewness-0.1819154297
Sum49970402.81
Variance26.10993718
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
40.5570264
 
< 0.1%
38.8950984
 
< 0.1%
41.3016114
 
< 0.1%
38.7140964
 
< 0.1%
42.0119514
 
< 0.1%
38.6865124
 
< 0.1%
40.5501994
 
< 0.1%
41.2125064
 
< 0.1%
43.3730764
 
< 0.1%
32.644694
 
< 0.1%
Other values (1247795)1296635
> 99.9%
ValueCountFrequency (%)
19.0277851
< 0.1%
19.0278041
< 0.1%
19.0297981
< 0.1%
19.0312421
< 0.1%
19.0322771
< 0.1%
19.0332881
< 0.1%
19.0342821
< 0.1%
19.0346871
< 0.1%
19.0354721
< 0.1%
19.0363121
< 0.1%
ValueCountFrequency (%)
67.5102671
< 0.1%
67.4415181
< 0.1%
67.3970181
< 0.1%
67.1881111
< 0.1%
67.0642771
< 0.1%
66.8351741
< 0.1%
66.6829051
< 0.1%
66.673551
< 0.1%
66.6646731
< 0.1%
66.6592421
< 0.1%

merch_long
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct1275745
Distinct (%)98.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-90.2264648
Minimum-166.671242
Maximum-66.950902
Zeros0
Zeros (%)0.0%
Negative1296675
Negative (%)100.0%
Memory size9.9 MiB

Quantile statistics

Minimum-166.671242
5-th percentile-119.3300916
Q1-96.8972755
median-87.438392
Q3-80.2367965
95-th percentile-73.3542179
Maximum-66.950902
Range99.72034
Interquartile range (IQR)16.660479

Descriptive statistics

Standard deviation13.77109056
Coefficient of variation (CV)-0.1526280631
Kurtosis1.848479176
Mean-90.2264648
Median Absolute Deviation (MAD)8.227889
Skewness-1.146959945
Sum-116994401.2
Variance189.6429353
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-81.2191894
 
< 0.1%
-74.6182694
 
< 0.1%
-87.1164144
 
< 0.1%
-91.0518443
 
< 0.1%
-91.6993523
 
< 0.1%
-80.2864053
 
< 0.1%
-80.2914333
 
< 0.1%
-96.5117633
 
< 0.1%
-80.41793
 
< 0.1%
-89.8711113
 
< 0.1%
Other values (1275735)1296642
> 99.9%
ValueCountFrequency (%)
-166.6712421
< 0.1%
-166.6701321
< 0.1%
-166.6696381
< 0.1%
-166.6661791
< 0.1%
-166.6648281
< 0.1%
-166.6628881
< 0.1%
-166.6619681
< 0.1%
-166.6592771
< 0.1%
-166.6578341
< 0.1%
-166.6571741
< 0.1%
ValueCountFrequency (%)
-66.9509021
< 0.1%
-66.9559961
< 0.1%
-66.956541
< 0.1%
-66.9586591
< 0.1%
-66.9587511
< 0.1%
-66.9591781
< 0.1%
-66.9619231
< 0.1%
-66.9629131
< 0.1%
-66.9639181
< 0.1%
-66.9639751
< 0.1%

is_fraud
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size9.9 MiB
0
1289169 
1
 
7506

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1296675
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
01289169
99.4%
17506
 
0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
01289169
99.4%
17506
 
0.6%

Most occurring characters

ValueCountFrequency (%)
01289169
99.4%
17506
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1296675
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
01289169
99.4%
17506
 
0.6%

Most occurring scripts

ValueCountFrequency (%)
Common1296675
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
01289169
99.4%
17506
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII1296675
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
01289169
99.4%
17506
 
0.6%

Interactions

[Figures: 100 pairwise interaction plots]

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (up to a sign flip when a scale factor is negative), so for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
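
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report generator runs; the function name `pearson_r` and the toy values are assumptions made for the example and are not taken from the profiled dataset.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are sufficient here.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy values for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shifting and rescaling either variable (with a positive scale) leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(round(r, 6), round(r_rescaled, 6))
```

The second call illustrates the invariance under changes in location and scale: both printed values agree.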

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
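
As a rough sketch of this recipe, the snippet below ranks each variable (giving tied values their average rank, which is one common convention and an assumption here) and then applies the covariance-over-standard-deviations formula to the ranks. The helper names `average_ranks` and `spearman_rho` and the toy data are illustrative only.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy example: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this toy input ρ reaches 1 because the relationship is perfectly monotonic, while Pearson's r stays below 1 because it is not linear.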

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
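
The pair-counting definition can be sketched directly. The version below is the simple form often called tau-a, which matches the description above but makes no adjustment for ties; statistical libraries frequently report a tie-adjusted variant instead, so results may differ on data with repeated values. The function name `kendall_tau_a` and the toy data are assumptions for illustration.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie adjustment."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1  # both variables order this pair the same way
        elif product < 0:
            discordant += 1  # the variables order this pair in opposite ways
        # product == 0 means a tie in x or y; it counts toward neither total
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy example: one swapped pair among four observations.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, so τ = (5 - 1) / 6.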

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
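
A minimal sketch of a bias-corrected Cramér's V, following the commonly cited form of Bergsma's correction, is shown below. It takes a contingency table of counts as a list of rows, computes the chi-squared statistic without a continuity correction (an assumption; the report generator may configure this differently), and applies the correction to both the φ² term and the table dimensions. The function name `cramers_v_corrected` and the toy tables are illustrative.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic, no continuity correction.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected bias and clamp at zero.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table (a single row or column)
    return sqrt(phi2_corr / denom)


# Toy tables: perfect association and exact independence.
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
v_independent = cramers_v_corrected([[25, 25], [25, 25]])
print(v_perfect, v_independent)
```

The two toy tables hit the ends of the range: the diagonal table gives a value of 1 (up to floating-point rounding) and the uniform table gives 0.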

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
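
The two displays above summarize the same information, which can be sketched in plain Python: a per-column count of missing cells, and a row-by-column grid marking which cells are present. The helper names and the toy rows below are assumptions for illustration; the rows contain deliberately missing values and are not records from the profiled dataset, which has no missing cells.

```python
def missing_counts(rows, columns):
    """Number of missing (None) cells in each column."""
    return {c: sum(1 for row in rows if row.get(c) is None) for c in columns}


def nullity_matrix(rows, columns):
    """One string per row: '#' marks a present cell, '.' marks a missing one."""
    return ["".join("." if row.get(c) is None else "#" for c in columns) for row in rows]


# Toy rows with gaps, for illustration only.
columns = ["amt", "city", "zip"]
rows = [
    {"amt": 4.97, "city": "A", "zip": 10001},
    {"amt": None, "city": "B", "zip": 10002},
    {"amt": 12.5, "city": None, "zip": None},
]

counts = missing_counts(rows, columns)
matrix = nullity_matrix(rows, columns)
print(counts)
print("\n".join(matrix))
```

Reading the grid column by column gives the per-column view; reading it row by row shows whether missing values tend to occur together in the same records.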

Sample

First rows

(index) | Unnamed: 0 | trans_date_trans_time | cc_num | merchant | category | amt | first | last | gender | street | city | state | zip | lat | long | city_pop | job | dob | trans_num | unix_time | merch_lat | merch_long | is_fraud
0 | 0 | 2019-01-01 00:00:18 | 2703186189652095 | fraud_Rippin, Kub and Mann | misc_net | 4.97 | Jennifer | Banks | F | 561 Perry Cove | Moravian Falls | NC | 28654 | 36.0788 | -81.1781 | 3495 | Psychologist, counselling | 1988-03-09 | 0b242abb623afc578575680df30655b9 | 1325376018 | 36.011293 | -82.048315 | 0
1 | 1 | 2019-01-01 00:00:44 | 630423337322 | fraud_Heller, Gutmann and Zieme | grocery_pos | 107.23 | Stephanie | Gill | F | 43039 Riley Greens Suite 393 | Orient | WA | 99160 | 48.8878 | -118.2105 | 149 | Special educational needs teacher | 1978-06-21 | 1f76529f8574734946361c461b024d99 | 1325376044 | 49.159047 | -118.186462 | 0
2 | 2 | 2019-01-01 00:00:51 | 38859492057661 | fraud_Lind-Buckridge | entertainment | 220.11 | Edward | Sanchez | M | 594 White Dale Suite 530 | Malad City | ID | 83252 | 42.1808 | -112.2620 | 4154 | Nature conservation officer | 1962-01-19 | a1a22d70485983eac12b5b88dad1cf95 | 1325376051 | 43.150704 | -112.154481 | 0
3 | 3 | 2019-01-01 00:01:16 | 3534093764340240 | fraud_Kutch, Hermiston and Farrell | gas_transport | 45.00 | Jeremy | White | M | 9443 Cynthia Court Apt. 038 | Boulder | MT | 59632 | 46.2306 | -112.1138 | 1939 | Patent attorney | 1967-01-12 | 6b849c168bdad6f867558c3793159a81 | 1325376076 | 47.034331 | -112.561071 | 0
4 | 4 | 2019-01-01 00:03:06 | 375534208663984 | fraud_Keeling-Crist | misc_pos | 41.96 | Tyler | Garcia | M | 408 Bradley Rest | Doe Hill | VA | 24433 | 38.4207 | -79.4629 | 99 | Dance movement psychotherapist | 1986-03-28 | a41d7549acf90789359a9aa5346dcb46 | 1325376186 | 38.674999 | -78.632459 | 0
5 | 5 | 2019-01-01 00:04:08 | 4767265376804500 | fraud_Stroman, Hudson and Erdman | gas_transport | 94.63 | Jennifer | Conner | F | 4655 David Island | Dublin | PA | 18917 | 40.3750 | -75.2045 | 2158 | Transport planner | 1961-06-19 | 189a841a0a8ba03058526bcfe566aab5 | 1325376248 | 40.653382 | -76.152667 | 0
6 | 6 | 2019-01-01 00:04:42 | 30074693890476 | fraud_Rowe-Vandervort | grocery_net | 44.54 | Kelsey | Richards | F | 889 Sarah Station Suite 624 | Holcomb | KS | 67851 | 37.9931 | -100.9893 | 2691 | Arboriculturist | 1993-08-16 | 83ec1cc84142af6e2acf10c44949e720 | 1325376282 | 37.162705 | -100.153370 | 0
7 | 7 | 2019-01-01 00:05:08 | 6011360759745864 | fraud_Corwin-Collins | gas_transport | 71.65 | Steven | Williams | M | 231 Flores Pass Suite 720 | Edinburg | VA | 22824 | 38.8432 | -78.6003 | 6018 | Designer, multimedia | 1947-08-21 | 6d294ed2cc447d2c71c7171a3d54967c | 1325376308 | 38.948089 | -78.540296 | 0
8 | 8 | 2019-01-01 00:05:18 | 4922710831011201 | fraud_Herzog Ltd | misc_pos | 4.27 | Heather | Chase | F | 6888 Hicks Stream Suite 954 | Manor | PA | 15665 | 40.3359 | -79.6607 | 1472 | Public affairs consultant | 1941-03-07 | fc28024ce480f8ef21a32d64c93a29f5 | 1325376318 | 40.351813 | -79.958146 | 0
9 | 9 | 2019-01-01 00:06:01 | 2720830304681674 | fraud_Schoen, Kuphal and Nitzsche | grocery_pos | 198.39 | Melissa | Aguilar | F | 21326 Taylor Squares Suite 708 | Clarksville | TN | 37040 | 36.5220 | -87.3490 | 151785 | Pathologist | 1974-03-28 | 3b9014ea8fb80bd65de0b1463b00b00e | 1325376361 | 37.179198 | -87.485381 | 0

Last rows

(index) | Unnamed: 0 | trans_date_trans_time | cc_num | merchant | category | amt | first | last | gender | street | city | state | zip | lat | long | city_pop | job | dob | trans_num | unix_time | merch_lat | merch_long | is_fraud
1296665 | 1296665 | 2020-06-21 12:08:42 | 213193596103206 | fraud_Gulgowski LLC | home | 72.17 | James | Hunt | M | 7369 Gabriel Tunnel | Pointe Aux Pins | MI | 49775 | 45.7549 | -84.4470 | 95 | Electrical engineer | 1994-02-09 | 108c103b26f686c24c021aaf4210977e | 1371816522 | 44.938461 | -83.996234 | 0
1296666 | 1296666 | 2020-06-21 12:09:22 | 4587657402165341815 | fraud_Hyatt, Russel and Gleichner | health_fitness | 7.30 | Amber | Lewis | F | 6296 John Keys Suite 858 | Pembroke Township | IL | 60958 | 41.0646 | -87.5917 | 2135 | Psychotherapist, child | 2004-05-08 | 37a18c6fb0c5c722b6339ffedc82f55a | 1371816562 | 40.556811 | -88.092339 | 0
1296667 | 1296667 | 2020-06-21 12:10:56 | 4822367783500458 | fraud_Hahn, Douglas and Schowalter | travel | 19.71 | Christopher | Farrell | M | 97070 Anderson Land | Haines City | FL | 33844 | 28.0758 | -81.5929 | 33804 | Exercise physiologist | 1991-01-01 | 34e72e0a659a6c8f4a20ee65594f3a7d | 1371816656 | 27.465871 | -81.511804 | 0
1296668 | 1296668 | 2020-06-21 12:11:23 | 213141712584544 | fraud_Metz, Russel and Metz | kids_pets | 100.85 | Margaret | Curtis | F | 742 Oneill Shore | Florence | MS | 39073 | 32.1530 | -90.1217 | 19685 | Fine artist | 1984-12-24 | 0d86d8c17638d7eff77db9c6a878b477 | 1371816683 | 31.377697 | -90.528450 | 0
1296669 | 1296669 | 2020-06-21 12:11:36 | 4400011257587661852 | fraud_Stiedemann Inc | misc_pos | 37.38 | Marissa | Powell | F | 474 Allen Haven | North Loup | NE | 68859 | 41.4972 | -98.7858 | 509 | Nurse, children's | 1980-09-15 | 9a7ea2625cf8303efe34e3c09546868f | 1371816696 | 41.728638 | -99.039660 | 0
1296670 | 1296670 | 2020-06-21 12:12:08 | 30263540414123 | fraud_Reichel Inc | entertainment | 15.56 | Erik | Patterson | M | 162 Jessica Row Apt. 072 | Hatch | UT | 84735 | 37.7175 | -112.4777 | 258 | Geoscientist | 1961-11-24 | 440b587732da4dc1a6395aba5fb41669 | 1371816728 | 36.841266 | -111.690765 | 0
1296671 | 1296671 | 2020-06-21 12:12:19 | 6011149206456997 | fraud_Abernathy and Sons | food_dining | 51.70 | Jeffrey | White | M | 8617 Holmes Terrace Suite 651 | Tuscarora | MD | 21790 | 39.2667 | -77.5101 | 100 | Production assistant, television | 1979-12-11 | 278000d2e0d2277d1de2f890067dcc0a | 1371816739 | 38.906881 | -78.246528 | 0
1296672 | 1296672 | 2020-06-21 12:12:32 | 3514865930894695 | fraud_Stiedemann Ltd | food_dining | 105.93 | Christopher | Castaneda | M | 1632 Cohen Drive Suite 639 | High Rolls Mountain Park | NM | 88325 | 32.9396 | -105.8189 | 899 | Naval architect | 1967-08-30 | 483f52fe67fabef353d552c1e662974c | 1371816752 | 33.619513 | -105.130529 | 0
1296673 | 1296673 | 2020-06-21 12:13:36 | 2720012583106919 | fraud_Reinger, Weissnat and Strosin | food_dining | 74.90 | Joseph | Murray | M | 42933 Ryan Underpass | Manderson | SD | 57756 | 43.3526 | -102.5411 | 1126 | Volunteer coordinator | 1980-08-18 | d667cdcbadaaed3da3f4020e83591c83 | 1371816816 | 42.788940 | -103.241160 | 0
1296674 | 1296674 | 2020-06-21 12:13:37 | 4292902571056973207 | fraud_Langosh, Wintheiser and Hyatt | food_dining | 4.30 | Jeffrey | Smith | M | 135 Joseph Mountains | Sula | MT | 59871 | 45.8433 | -113.8748 | 218 | Therapist, horticultural | 1995-08-16 | 8f7c8e4ab7f25875d753b422917c98c9 | 1371816817 | 46.565983 | -114.186110 | 0